Developing crossmodal expression recognition based on a deep neural model
نویسندگان
چکیده
A robot capable of understanding emotion expressions can increase its own capability of solving problems by using emotion expressions as part of its own decision-making, in a similar way to humans. Evidence shows that the perception of human interaction starts with an innate perception mechanism, where the interaction between different entities is perceived and categorized into two very clear directions: positive or negative. While the person is developing during childhood, the perception evolves and is shaped based on the observation of human interaction, creating the capability to learn different categories of expressions. In the context of human-robot interaction, we propose a model that simulates the innate perception of audio-visual emotion expressions with deep neural networks, that learns new expressions by categorizing them into emotional clusters with a self-organizing layer. The proposed model is evaluated with three different corpora: The Surrey Audio-Visual Expressed Emotion (SAVEE) database, the visual Bi-modal Face and Body benchmark (FABO) database, and the multimodal corpus of the Emotion Recognition in the Wild (EmotiW) challenge. We use these corpora to evaluate the performance of the model to recognize emotional expressions, and compare it to state-of-the-art research.
منابع مشابه
بهبود مدل تفکیککننده منیفلدهای غیرخطی بهمنظور بازشناسی چهره با یک تصویر از هر فرد
Manifold learning is a dimension reduction method for extracting nonlinear structures of high-dimensional data. Many methods have been introduced for this purpose. Most of these methods usually extract a global manifold for data. However, in many real-world problems, there is not only one global manifold, but also additional information about the objects is shared by a large number of manifolds...
متن کاملNamed Entity Recognition in Persian Text using Deep Learning
Named entities recognition is a fundamental task in the field of natural language processing. It is also known as a subset of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...
متن کاملSpeech Emotion Recognition Using Scalogram Based Deep Structure
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...
متن کاملشبکه عصبی پیچشی با پنجرههای قابل تطبیق برای بازشناسی گفتار
Although, speech recognition systems are widely used and their accuracies are continuously increased, there is a considerable performance gap between their accuracies and human recognition ability. This is partially due to high speaker variations in speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
متن کاملImproving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM
Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...
متن کامل